Bitmap Indices for Speeding Up High-Dimensional Data Analysis

Author

  • Kurt Stockinger
Abstract

Bitmap indices have gained wide acceptance in data warehouse applications and are an efficient access method for querying large amounts of read-only data. The main trend in bitmap index research focuses on typical business applications based on discrete attribute values. However, scientific data that is mostly characterised by non-discrete attributes cannot be queried efficiently by currently supported access methods. In our previous work [13] we introduced a novel bitmap algorithm called GenericRangeEval for efficiently querying scientific data. We evaluated our approach based primarily on uniformly distributed and independent data. In this paper we analyse the behaviour of our bitmap index algorithm against various queries based on different data distributions. We have implemented an improved version of one of the most cited bitmap compression algorithms called Byte Aligned Bitmap Compression and adapted it to our bitmap indices. To prove the efficiency of our access method, we carried out high-dimensional queries against real data taken from two different scientific applications, namely High Energy Physics and Astronomy. The results clearly show that depending on the underlying data distribution and the query access patterns, our proposed bitmap indices can significantly improve the response time of high-dimensional queries when compared to conventional access methods.
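As a rough illustration of why non-discrete attributes need special handling, the following is a minimal sketch of a binned bitmap index with a candidate check for partially covered bins. It is not the GenericRangeEval algorithm from the paper and does not include any bitmap compression; the function names, the bin edges, and the sample values are all hypothetical and chosen only for this example.

```python
from bisect import bisect_right


def build_binned_index(values, edges):
    """Build one bitmap per bin (hypothetical helper, not from the paper).

    Bin b covers the half-open range [edges[b], edges[b + 1]).
    Each bitmap is a Python int used as a bit set: bit i is set
    when values[i] falls into that bin.
    """
    bitmaps = [0] * (len(edges) - 1)
    for row, v in enumerate(values):
        if not (edges[0] <= v < edges[-1]):
            raise ValueError(f"value {v} lies outside the binned range")
        b = bisect_right(edges, v) - 1  # index of the bin containing v
        bitmaps[b] |= 1 << row
    return bitmaps


def range_query(values, edges, bitmaps, lo, hi):
    """Return the row numbers with lo <= value < hi."""
    hits = 0        # rows known to qualify from the bitmaps alone
    candidates = 0  # rows in partially covered bins; need a value check
    for b, bitmap in enumerate(bitmaps):
        left, right = edges[b], edges[b + 1]
        if lo <= left and right <= hi:
            hits |= bitmap            # bin lies fully inside the query range
        elif left < hi and lo < right:
            candidates |= bitmap      # bin only overlaps the query range

    # Candidate check: read the raw values for rows in the edge bins.
    row = 0
    while candidates:
        if candidates & 1 and lo <= values[row] < hi:
            hits |= 1 << row
        candidates >>= 1
        row += 1

    return [r for r in range(len(values)) if (hits >> r) & 1]


# Assumed sample data: six floating-point attribute values, four bins.
values = [0.5, 1.2, 2.7, 3.1, 3.9, 1.9]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
index = build_binned_index(values, edges)
print(range_query(values, edges, index, 1.5, 3.5))  # prints [2, 3, 5]
```

In this sketch only the rows in the two edge bins are compared against the raw values; the fully covered bin is answered from its bitmap alone, which is where a bitmap index can save work on range queries.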

Similar resources

Design and Implementation of Bitmap Indices for Scientific Data

Bitmap indices are efficient multi-dimensional index data structures for handling complex ad hoc queries in read-mostly environments. They have been implemented in several commercial database systems but are only well suited for discrete attribute values, which are very common in typical business applications. However, many scientific applications usually operate on floating point numbers and cann...

Improving the Performance of High-Energy Physics Analysis through Bitmap Indices

Bitmap indices are popular multi-dimensional data structures for accessing read-mostly data such as data warehouse (DW) applications, decision support systems (DSS) and on-line analytical processing (OLAP). One of their main strengths is that they provide good performance characteristics for complex ad hoc queries and an efficient combination of multiple index dimensions in one query. Considerab...

Optimizing I/O Costs of Multi-dimensional Queries Using Bitmap Indices

Bitmap indices are efficient data structures for processing complex, multi-dimensional queries in data warehouse applications and scientific data analysis. For high-cardinality attributes, a common approach is to build bitmap indices with binning. This technique partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rat...

FastBit: An Efficient Indexing Technology For Accelerating Data-Intensive Science

FastBit is a software tool for searching large read-only datasets. It organizes user data in a column-oriented structure which is efficient for on-line analytical processing (OLAP), and utilizes compressed bitmap indices to further speed up query processing. Analyses have proven the compressed bitmap index used in FastBit to be theoretically optimal for one-dimensional queries. Compared with oth...

Bitmap Indices for Data Warehouses

In this chapter we discuss various bitmap index technologies for efficient query processing in data warehousing applications. We review the existing literature and organize the technology into three categories, namely bitmap encoding, compression and binning. We introduce an efficient bitmap compression algorithm and examine the space and time complexity of the compressed bitmap index on large ...

Publication date: 2002